Recently, several powerful tools for the reconstruction of stochastic differential equations from 
measured data sets have been proposed [e.g. Siegert et al., Physics Letters A 243, 275 (1998); 
Hurn et al., Journal of Time Series Analysis 24, 45 (2003)]. Efficient application of the methods, 
however, generally requires Markov properties to be fulfilled. This constraint typically seems to 
be violated on small scales, which frequently is attributed to physical effects. On the other hand, 
measurement noise such as uncorrelated measurement and discretization errors has a large impact on 
the statistics of measurements on small scales. We demonstrate that the presence of measurement 
noise likewise spoils the Markov properties of an underlying Markov process. This fact is promising for 
the further development of techniques for the reconstruction of stochastic processes from measured 
data, since limitations at small scales might stem from artificial noise sources rather than from 
intrinsic properties of the dynamics of the underlying process. Measurement noise, however, can be 
controlled much better than the intrinsic dynamics of the underlying process. 

PACS numbers: 02.50.Ga, 05.45.Tp 



I. INTRODUCTION 

Physical systems often are described by means of dy- 
namical systems defined by differential equations of first 
order in time. The knowledge of a single point in phase 
space is sufficient for precise prediction of the future evo- 
lution of the system. Starting from this initial condition, 
the equations of motion can be integrated - at least nu- 
merically. Some systems are very sensitive to the initial 
condition and therefore are associated with deterministic 
chaos. 

For complex systems, a deterministic description of- 
ten is not feasible due to the huge amount of degrees 
of freedom and their frequently unknown microscopic in- 
teractions. However, in many cases the individual pro- 
cesses act on two different time scales. The dynamics of 
the entire system then can be reduced to the dynamics 
of some macroscopic order parameters, that enslave the 
highly fluctuating microscopic degrees of freedom pj. In 
turn, the set of order parameters, x, obeys stochastic 
differential equations (SDEs). If the SDEs are of first or- 
der in time, trajectories likewise can be generated from 
one single initial state. The evolution then does not de- 
pend on properties of the trajectory prior to the initial 
point and, therefore, exhibits only a very restricted mem- 
ory. Realisations of particular trajectories sensitively de- 
pend on the fluctuating random forces, that are involved. 
However, considering an ensemble of realizations of the 
stochastic process, the Markovian property becomes ev- 
ident. 

In recent years, the analysis of stochastic time series
has made great advances. In particular, the non-parametric
reconstruction of the governing stochastic differential
equation by direct evaluation of the drift and
diffusion functions has become a successful tool for
analyzing stochastic processes. A method that initially was
proposed by Siegert et al. [1] has in the meantime been
applied to several problems in the fields of finance 0], life
sciences 0, H, S H S 03 and turbulence [11]. Moreover,
algorithms for the efficient application of maximum
likelihood methods have been developed [12], [13]. A brief
overview of the estimation power of several methods
can be found in [14]. Quite recently, an algorithm has
been proposed that combines the capabilities of the
latter methods [15], [16]. However, the validity of Markov
properties remains a crucial constraint for the efficient
application of all these procedures to stationary time
series data.

A close inspection of data sets generally indicates that
Markov properties are violated at small time differences.
Typically, physical arguments are invoked to account for this
effect, based on the fact that stochastic forces actually are
correlated in time on small time differences. The aim
of the present note is to study the influence of measurement
noise on the Markov properties of measured data.
We shall show that measurement noise likewise interferes
with and spoils the Markov properties.

The paper is organized as follows. In the next section,
some methods for verification of the Markov properties
of measured data sets are reconsidered. Section III
contains the basic arguments concerning the influence
of measurement noise on the transition probability
density functions. Consequences of the central equation
(6) for the Markov properties are made explicit by
means of three limiting cases that are discussed at the
end of the section. In section IV the general results of
the preceding section are illustrated by means of two
particular examples: the impact of discretization
noise on a purely deterministic system and the effects
of uncorrelated measurement noise on a stochastic process.
We conclude with section V, which
summarises the main results of our investigations and
discusses the consequences for standard tools of data
analysis.

II. VERIFICATION OF MARKOV PROPERTIES 

Multivariate joint probability density functions 
(PDFs) are of great importance for the analysis of 
measured time series x(t). In principle, they contain all 
information on the initial data set such as spatial and 
temporal evolution. The benefit from a probabilistic 
approach on the basis of high dimensional joint PDFs, 
however, generally is limited. 

The analysis substantially can be simplified, if the data 
set under consideration satisfies Markov properties. This 
circumstance is equivalent to the representation of all 
multivariate joint PDFs in products of single-conditioned 
PDFs, 

$$P_n(x_n, x_{n-1}, \ldots, x_0) = P_2(x_n | x_{n-1}) \times \ldots \times P_2(x_1 | x_0)\, P_1(x_0) . \tag{1}$$

Here, $P_1(x_i)$ is a shorthand notation for the probability
of being at time $t_i$ in a small interval at $x_i$, with
$t_i < t_{i+1}\ \forall i$. In general, the latter transition PDFs
furthermore explicitly depend on the times $t_n, \ldots, t_0$.

Let us now assume the sample to be ergodic and stationary
in the sense that ensemble averages can be carried
out by means of time averages and the PDFs do
not depend on time explicitly [17]. Then, the
Chapman-Kolmogorov equation [17], [18]

$$P(x_i | x_{i-2}) = \int dx_{i-1}\, P(x_i | x_{i-1})\, P(x_{i-1} | x_{i-2}) , \tag{2}$$

has to be fulfilled for any Markov process. This equation 
can be evaluated numerically for measured data sets. Al- 
though the validation of this equation is not sufficient for 
the validity of Markov properties, it has turned out to be 
a very robust criterion. 
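
A minimal sketch of how such a numerical check might look for a series that has already been discretized into a finite number of states. The function names, the three-state test chain and its transition matrix are illustrative assumptions and are not taken from the cited procedures. The convention used is `T[i, j] = P(x_{t+lag} = j | x_t = i)`, so that the two-step matrix of a Markov chain equals the matrix product of two one-step matrices.

```python
import numpy as np

def transition_matrix(states, n_states, lag):
    """Estimate P(x_{t+lag} = j | x_t = i) by counting pairs in a discrete series."""
    counts = np.zeros((n_states, n_states))
    np.add.at(counts, (states[:-lag], states[lag:]), 1.0)
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows without any observation are left as zeros instead of dividing by zero.
    return counts / np.where(row_sums > 0, row_sums, 1.0)

def chapman_kolmogorov_gap(states, n_states, lag=1):
    """Largest deviation between the 2*lag matrix and the product of two lag matrices."""
    t1 = transition_matrix(states, n_states, lag)
    t2 = transition_matrix(states, n_states, 2 * lag)
    return np.max(np.abs(t2 - t1 @ t1))

# Hypothetical test data: a three-state Markov chain with a known transition matrix.
rng = np.random.default_rng(0)
true_t = np.array([[0.8, 0.15, 0.05],
                   [0.2, 0.6, 0.2],
                   [0.1, 0.3, 0.6]])
cum = np.cumsum(true_t, axis=1)
cum[:, -1] = 1.0                      # guard against rounding in the cumulative sums
n = 100_000
u = rng.random(n)
chain = np.zeros(n, dtype=int)
for k in range(1, n):
    chain[k] = np.searchsorted(cum[chain[k - 1]], u[k])

gap = chapman_kolmogorov_gap(chain, 3)
```

For a genuinely Markovian chain, `gap` is expected to be of the order of the sampling error only. As stated above, a small gap alone does not prove that the data are Markovian.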

Moreover, a direct comparison of the conditional probability
distributions $P(x_2|x_1,x_0)$ and $P(x_2|x_1)$ has been
used for validation of Markov properties. For Markovian
data, these functions should coincide for arbitrary values
of $x_0$. An example for the application of this procedure
by means of graphical inspection of the PDFs is depicted
in figure 1, which has been prepared by Wachter et
al. in connection with the study of the statistical properties
of height profiles of gold surfaces [10]. Here, Wachter
et al. investigated Markov properties of the transition
PDFs for nested height increments at different scales. In
the present case, Markov properties might be fulfilled for
scales separated by $\Delta r = 35$ nm, whereas they evidently
are violated at separation lengths of $\Delta r = 14$ nm, as can
be seen from inspection of figure 1. Evidently,
a proper interpretation of the plot with respect to the
Markov properties has to be quantified by introducing
a certain measure for the distance between the two probability
distributions. To this end the Wilcoxon test [19], [20]
can be applied. It allows the comparison of PDFs that originate
from samples of different size and makes only few demands
on the properties of the individual PDFs. The
numerical implementation is straightforward; results for
the present example are, e.g., depicted in [10]. For a detailed
description of the Wilcoxon test we refer to the
appendix of [21].

If the direct estimation [1] of drift and diffusion functions
from measured data sets is intended and the underlying
process, therefore, is assumed to obey Langevin
equations, an alternative method can be applied for inspection
of Markov properties. Once the estimation procedure
has been performed and an estimate for drift and
diffusion functions is available, the character of the dynamical
noise can be determined from the sample. The
presence of noise without any temporal and spatial correlations
is a sufficient indication for compliance of the
measured data set with Markov properties. This procedure
is, e.g., outlined and applied in [22]. It is certainly
the most direct way to investigate Markov properties.

III. IMPACT OF MEASUREMENT NOISE ON 
MARKOV PROPERTIES 

An ensemble of Markov processes $x(t)$ is considered
that now is distorted by measurement noise $\xi$. For simplicity,
the details are carried out for a one-dimensional
process. Only three consecutive points $x_0$, $x_1$ and $x_2$
with $x_i := x(t_i)$ and $t_i := t_0 + i\tau$ are investigated for
this purpose. Since the statistics are assumed to be stationary,
this is sufficient for the current considerations.
Henceforth, $P_x(x_{i+1}|x_i)$ is a shorthand notation for the
transition PDF of the variable $x$ in the time increment $\tau$.

Let us now assume that the true process is hidden from
the data analyst: Instead of the variable $x(t)$, a perturbed
variable $y(t)$ is measured that emerges from the initial
process by means of the relation

$$y(t) = x(t) + \xi(x(t), t) . \tag{3}$$

Thereby, $\xi(x(t),t)$ is a stochastic variable that incorporates
systematic and non-systematic measurement errors.
We further assume that the deterministic contributions
to the measurement error can be identified and the noise
$\xi$ can be specified by

$$\xi(x(t), t) = \xi_s(x(t)) + \xi_{ns}(t) . \tag{4}$$

Here, $\xi_{ns}$ incorporates non-systematic noise sources. For
reasons of simplicity, we assume these errors to be independent
of one another for consecutive measurements,

$$\langle \xi_{ns}(t+\tau)\, \xi_{ns}(t) \rangle \sim \delta(\tau) . \tag{5}$$

On the other hand, $\xi_s$ characterises deterministic, systematic
measurement errors that have no explicit dependence
on $t$. We would like to emphasize that discretization
errors, which are an intrinsic feature of any digital
measurement procedure, fall into this broad class.
While the former noise is uncorrelated, this assumption
generally is violated for the latter noise source due to
correlations in the variable $x$ itself.

FIG. 1: Example for the analysis of Markov properties by means of graphical inspection of transition PDFs, prepared by
Wachter et al. [10]. Test for Markov properties of Au film data for two different scale separations $\Delta r = 14$ nm (lhs) and
35 nm (rhs), where $\Delta r = r_3 - r_2 = r_2 - r_1$. In both cases $r_2 = 169$ nm. In each case a contour plot of conditional probabilities
$P(h_1, r_1 | h_2, r_2)$ (dashed lines) and $P(h_1, r_1 | h_2, r_2; h_3 = 0, r_3)$ (solid lines) is shown in the top panel. Contour levels differ by a
factor of 10, with an additional level at $p = 0.3$. Below the top panels in each case, two one-dimensional cuts at $h_2 \approx \pm\sigma_\infty$
are shown with $P(h_1, r_1 | h_2, r_2)$ as dashed lines and $P(h_1, r_1 | h_2, r_2; h_3 = 0, r_3)$ as circles. From the deviations of the PDFs for
$\Delta r = 14$ nm (lhs) it becomes evident that Markov properties are not fulfilled in this case. They might, however, be valid for
$\Delta r = 35$ nm (rhs).

The probability for the measurement of $y_i$ now solely
depends on the entangled variable $x_i$ and can be specified
by means of the conditional probability $P_\xi(y_i|x_i)$. Hence,
the conditional probability $P_y(y_2|y_1,y_0)$ for the process
$y(t)$ can be calculated by means of its definition through
joint probabilities. Application of the Markov properties
of the underlying process $x(t)$ finally yields

$$P_y(y_2|y_1,y_0) = \frac{P_y(y_2,y_1,y_0)}{P_y(y_1,y_0)} = \frac{\int dx_2 \int dx_1 \int dx_0\, P_\xi(y_2|x_2) P_\xi(y_1|x_1) P_\xi(y_0|x_0) P_x(x_2|x_1) P_x(x_1|x_0) P_x(x_0)}{\int dx_1 \int dx_0\, P_\xi(y_1|x_1) P_\xi(y_0|x_0) P_x(x_1|x_0) P_x(x_0)} . \tag{6}$$



In general, this expression deviates from the single-conditioned
PDF $P_y(y_2|y_1)$. Therefore, noisy measurements
on perfect Markov processes in general lose their Markov
property due to the inexact measurement procedure.
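
The loss of the Markov property can be made concrete with a discrete toy analogue of expression (6), in which the integrals become sums. The sketch below assumes a hypothetical two-state hidden Markov chain $x$ and a readout $y$ that reports the wrong state with probability `eps`; the matrices and function names are illustrative choices and do not stem from the derivation above.

```python
import numpy as np

def conditional_pdfs(trans, emit):
    """Return P(y2|y1,y0) and P(y2|y1) for a stationary hidden Markov chain.

    trans[a, b] = P(x_{i+1} = b | x_i = a), emit[a, i] = P(y = i | x = a).
    """
    # Stationary distribution of x, taken from a high power of the transition matrix.
    p_x = np.linalg.matrix_power(trans, 1000)[0]
    # Discrete counterpart of the numerator of expression (6): joint[y0, y1, y2].
    joint = np.einsum('a,ai,ab,bj,bc,ck->ijk', p_x, emit, trans, emit, trans, emit)
    p_double = joint / joint.sum(axis=2, keepdims=True)   # indexed [y0, y1, y2]
    pair = joint.sum(axis=0)                              # indexed [y1, y2]
    p_single = pair / pair.sum(axis=1, keepdims=True)
    return p_double, p_single

def y0_dependence(eps):
    """Largest change of P(y2|y1,y0) when only y0 is varied."""
    trans = np.array([[0.9, 0.1], [0.2, 0.8]])           # hypothetical hidden chain
    emit = np.array([[1 - eps, eps], [eps, 1 - eps]])    # hypothetical readout error
    p_double, _ = conditional_pdfs(trans, emit)
    return np.max(np.abs(p_double[0] - p_double[1]))

without_noise = y0_dependence(0.0)
with_noise = y0_dependence(0.1)
```

Without readout errors the double-conditioned probabilities do not depend on $y_0$, so `without_noise` vanishes up to rounding. With readout errors, `with_noise` is clearly non-zero: the earlier measurement carries additional information on the hidden state.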

Referring to section I this means that a single point
from a noisy measurement on a Markov process, $y(t_n)$,
is not in every case sufficient for a proper prediction of
the future dynamics of the measured data. This makes
sense, since the intrinsic state of the system, $x(t_n)$, can hardly
be estimated from just one single measurement due
to the measurement uncertainty. Rather, the consideration
of several noisy measurements, $y(t_0), \ldots, y(t_n)$,
can enhance the accuracy of the predicted probability of
$y(t_{n+1})$.
At least for three simple cases, expression (6) can be
investigated analytically.

First, Markov properties are retrieved for the trivial
case $P_\xi(y|x) = \delta(y - x)$, where actually no measurement
noise is present.

Second, (6) can be evaluated for $P_x(x_{i+1}|x_i) = P_x(x_{i+1})$.
In this case, the entangled process itself does
not show any correlations. Frequently, this is approximately
true for large time increments between individual
measurements. If so, the integrals disentangle and
the noisy measurements themselves turn out to be independent
of one another, $P_y(y_2|y_1,y_0) = P_y(y_2)$. Thus,
the measured variable $y$ satisfies Markov properties.

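Third, noisy measurements can be considered that
sample the process much faster than the intrinsic dynamics
of the entangled variable $x$. Therefore, $P_x(x_2|x_1) = \delta(x_2 - x_1)$
is a reasonable approximation of the transition
PDF on consecutive measurements. Moreover, only
purely non-systematic, Gaussian measurement noise with
variance $\sigma^2$ is taken into account. In this case, evaluation
of expression (6) yields

$$P_y(y_2|y_1,y_0) = \frac{1}{\sqrt{3\pi\sigma^2}} \exp\left[-\frac{\left(y_2 - \frac{1}{2}(y_1+y_0)\right)^2}{3\sigma^2}\right] \times \frac{\int dx_0\, \sqrt{\frac{3}{2\pi\sigma^2}} \exp\left[-\frac{3\left(x_0 - \frac{1}{3}(y_2+y_1+y_0)\right)^2}{2\sigma^2}\right] P_x(x_0)}{\int dx_0\, \sqrt{\frac{1}{\pi\sigma^2}} \exp\left[-\frac{\left(x_0 - \frac{1}{2}(y_1+y_0)\right)^2}{\sigma^2}\right] P_x(x_0)} . \tag{7}$$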



In the latter factor, two different convolutions occur
in numerator and denominator: The stationary PDF
$P_x(x_0)$ is convolved with Gaussian PDFs with different
standard deviations, centred at the average value of
$y_2, y_1, y_0$ and $y_1, y_0$, respectively. Therefore, this expression
generally depends on the value $y_0$ and conflicts with
Markov properties of $y(t)$. We would like to emphasize
that the approximation of a persistent entangled process
is feasible for fast but noisy measurements on rather slow
processes. The current case reveals the loss of Markov
properties on very small time scales for this kind of
measurement, a loss that does not stem from the intrinsic
dynamics but purely from uncertainties during the
measurement process.
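
The third limiting case can be checked numerically. The sketch below assumes, purely for illustration, a Gaussian stationary PDF $P_x$ with unit variance and a noise variance $\sigma^2 = 0.25$. It evaluates expression (6) for a persistent process ($x_2 = x_1 = x_0$) by direct quadrature and compares the result with a factorised form, namely a Gaussian prefactor of variance $3\sigma^2/2$ times the ratio of two convolutions of $P_x$ with Gaussians of variance $\sigma^2/3$ and $\sigma^2/2$. That factorised form is an assumption obtained by completing the square in the exponents.

```python
import numpy as np

def gauss(z, var):
    """Normalised Gaussian density with zero mean and variance var."""
    return np.exp(-z**2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

x = np.linspace(-8.0, 8.0, 8001)   # integration grid for x_0
dx = x[1] - x[0]
p_x = gauss(x, 1.0)                # assumed stationary PDF of the slow process
sigma2 = 0.25                      # assumed variance of the measurement noise

def direct(y2, y1, y0):
    """Persistent-process limit of (6): ratio of three- and two-point joint PDFs."""
    num = np.sum(gauss(y2 - x, sigma2) * gauss(y1 - x, sigma2)
                 * gauss(y0 - x, sigma2) * p_x) * dx
    den = np.sum(gauss(y1 - x, sigma2) * gauss(y0 - x, sigma2) * p_x) * dx
    return num / den

def factorised(y2, y1, y0):
    """Gaussian prefactor times the ratio of two convolutions of P_x."""
    prefactor = gauss(y2 - 0.5 * (y1 + y0), 1.5 * sigma2)
    num = np.sum(gauss(x - (y2 + y1 + y0) / 3.0, sigma2 / 3.0) * p_x) * dx
    den = np.sum(gauss(x - 0.5 * (y1 + y0), sigma2 / 2.0) * p_x) * dx
    return prefactor * num / den

# Same y_2 and y_1, two different values of y_0.
p_a = direct(0.5, 0.5, 0.5)
p_b = direct(0.5, 0.5, -0.5)
```

Both evaluations agree with each other, while `p_a` and `p_b` differ markedly: for fixed $y_1$ the predicted density of $y_2$ still changes with $y_0$.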



IV. EXAMPLES



Let us now elucidate the findings of the preceding section
by means of two examples. First, the influence of
discretization noise on the properties of a simple deterministic
process is investigated. By construction, the
violation of the Chapman-Kolmogorov equation can be
demonstrated.

Second, the influence of Gaussian measurement noise
on the Markov properties of a stochastic process at relatively
high time lag is considered. The effect of measurement
noise becomes obvious from the inspection of
conditional PDFs obtained by numerical integration of
the Chapman-Kolmogorov equation (2).




FIG. 2: Example of a process $x(t)$ according to eqn. (8) that
is affected by strong discretization noise. Here, the faint line
specifies the original process $x(t)$, whereas the bold line depicts
the evolution $y(t)$ that eventually is obtained from the
measurement due to discretization errors.



A. Influence of discretization noise on a 
deterministic process 



We consider the elementary process

$$x(t_0 + \tau) = x(t_0) \exp[-\gamma\tau] . \tag{8}$$

It is the general solution of the ordinary differential equation
$\dot{x} = -\gamma x$. Since the dynamics are of first order in
time, the one-dimensional process $x(t)$ can be specified by
one initial condition and, therefore, is Markovian. Due
to the deterministic character, the conditional transition
PDF for the variable $x$ in the time interval $\tau$ complies
with

$$P_x(x_1 | x_0, \tau) = \delta\left(x_1 - x_0 e^{-\gamma\tau}\right) . \tag{9}$$

The process apparently is not stationary, since no forcing
is present. The conditional transition PDFs, however,
do not depend on time explicitly. We now consider the
statistics of an ensemble of measurements, whose initial
positions $x(t_0)$ of the individual processes are distributed
according to $P_x(x)$.

We assume that the exact intrinsic variable $x$ is entangled
due to discretization errors that occur during an
imaginary measurement procedure. Therefore, the exact,
continuous variables $x$ are mapped to a finite set of
discrete variables $\Omega = \{\omega_0, \ldots, \omega_n\}$ according to the rule

$$y = \omega_i \quad \text{such that} \quad \omega_i^- \le x < \omega_i^+ . \tag{10}$$

Here, the intervals $[\omega_i^-, \omega_i^+]$ and $[\omega_{i+1}^-, \omega_{i+1}^+]$ associated
with the variables $\omega_i$ and $\omega_{i+1}$ are connected to one another
by the requirements $\omega_i^+ = \omega_{i+1}^-$ and $\omega_i^- < \omega_i^+$.
Moreover it is implied that any measured value $x$ can be
mapped by means of (10). The interval $[\omega_0^-, \omega_n^+]$, thus,
covers all values $x(t)$ that are realised by any process under
consideration at any time $t$. The discretization noise
can be specified in compliance with the notation of the
preceding section by the conditional PDF

$$P_\xi(y|x) = \begin{cases} 1 & x \in [y^-, y^+] \\ 0 & x \notin [y^-, y^+] \end{cases} . \tag{11}$$

The effect of discretization noise on the initial variable
$x$ is illustrated in figure 2. As $y$ only assumes discrete
values $\omega_0, \ldots, \omega_n$, the normalisation of the latter PDF
for any $x$ is guaranteed by the equation

$$\sum_{y \in \Omega} P_\xi(y|x) = 1 . \tag{12}$$




We now would like to demonstrate the loss of Markov
properties due to the discretization of the signal. In principle,
eqn. (6) could be evaluated numerically for
the ensemble under consideration directly. However, in this case
the invalidity of the Chapman-Kolmogorov equation (2)
can conveniently be utilized for this purpose.

Analogous to eqn. (6), the transition PDF conditioned
on a single point can be specified,

$$P_y(y_1|y_0,\tau) = \frac{\int dx_1 \int dx_0\, P_\xi(y_1|x_1) P_\xi(y_0|x_0) P_x(x_1|x_0,\tau) P_x(x_0)}{\int dx_0\, P_\xi(y_0|x_0) P_x(x_0)} . \tag{13}$$

Application of the particular transition PDFs (9) and
(11) yields

$$P_y(y_1|y_0,\tau) = \frac{\int_{\max(y_0^-,\, e^{\gamma\tau} y_1^-)}^{\min(y_0^+,\, e^{\gamma\tau} y_1^+)} dx_0\, P_x(x_0)}{\int_{y_0^-}^{y_0^+} dx_0\, P_x(x_0)} . \tag{14}$$



If the process $y(t)$ obeyed Markov properties, the
discrete version of the Chapman-Kolmogorov equation,

$$P_y(y_2|y_0, 2\tau) = \sum_{y_1 \in \Omega} P_y(y_2|y_1,\tau)\, P_y(y_1|y_0,\tau) , \tag{15}$$

would have to be fulfilled for any choice of $y_2$, $y_0$ and $\tau$.
For $y_2 = y_0 = y$ with $y^- > 0$ and $\tau = \frac{1}{2\gamma}\log(y^+/y^-)$
the invalidity of this equation is evident if $P_x(x) > 0$
for $x \in [y^-, y^+]$: The left hand side of eqn. (15) vanishes,
whereas the sum on the right hand side involves
the summand

$$\left[\frac{\int_{(y^+ y^-)^{1/2}}^{y^+} dx_0\, P_x(x_0)}{\int_{y^-}^{y^+} dx_0\, P_x(x_0)}\right]^2 > 0 . \tag{16}$$

As the other summands are non-negative, the
Chapman-Kolmogorov equation is violated for the process under
consideration. Consequently, the distorted process $y(t)$
does not comply with Markov properties any more.
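
The argument can be retraced with a few lines of code. The sketch below makes simplifying assumptions that are not part of the text: $P_x$ is taken to be uniform on the conditioning bin in every step, so that the ratio of integrals in (14) reduces to an overlap length divided by the bin width, and the grid of three unit bins, the value of `gamma` and all names are hypothetical. The time lag is chosen such that $e^{2\gamma\tau} = y^+/y^-$.

```python
import numpy as np

def p_trans(target, source, gamma, tau):
    """Transition probability (14) for a P_x that is uniform on the source bin.

    target and source are (lower, upper) boundaries of the bins of y_1 and y_0.
    """
    growth = np.exp(gamma * tau)
    lower = max(source[0], growth * target[0])
    upper = min(source[1], growth * target[1])
    # An empty integration range corresponds to a vanishing probability.
    return max(upper - lower, 0.0) / (source[1] - source[0])

gamma = 1.0
bins = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]   # hypothetical discretization grid
y = bins[1]                                   # bin with y^- = 1 and y^+ = 2
tau = np.log(y[1] / y[0]) / (2.0 * gamma)

# Left and right hand side of the discrete Chapman-Kolmogorov equation (15).
lhs = p_trans(y, y, gamma, 2.0 * tau)
rhs = sum(p_trans(y, mid, gamma, tau) * p_trans(mid, y, gamma, tau) for mid in bins)
```

After the time $2\tau$ every trajectory has left the bin, so `lhs` vanishes, whereas `rhs` is dominated by the term in which the intermediate measurement falls into the same bin and remains clearly positive.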



FIG. 3: Detail of a sample path of the stochastic process
(17) for the parameters $(\gamma, D) = (0.75, 0.1)$. The occurrence
of distinct peaks is characteristic for multiplicative stochastic
processes.



B. Influence of measurement noise on a stochastic 
process 

The influence of Gaussian measurement noise on a
one-dimensional stochastic process with drift and diffusion
functions

$$D^{(1)}(x) = x \left( D - \gamma \log x \right) \tag{17a}$$
$$D^{(2)}(x) = D x^2 \tag{17b}$$

is investigated. For further details on stochastic processes
we refer to [17], [18]. This process already has been discussed
in [16] within the scope of an analytical example.
There, the following procedure for the exact simulation
of a discrete sample of this process by means of the
underlying Ornstein-Uhlenbeck process $s(t)$ has been motivated,

$$x_i = \exp(s_i) \tag{18a}$$
$$s_{i+1} = s_i\, e^{-\gamma\tau} + \sqrt{\frac{D}{\gamma}\left(1 - e^{-2\gamma\tau}\right)}\; \Gamma_i . \tag{18b}$$

Here, equation (18b) is the rule for the discrete simulation
of an underlying Ornstein-Uhlenbeck process $s$,
where $\Gamma_i$ are normally distributed independent random
variables with variance 1. It is deduced from the transition
PDFs for the Ornstein-Uhlenbeck process, which can
be specified exactly even for finite time lag $\tau$. In
this vein, discretization errors stemming from the standard
schemes for the numerical integration of SDEs [23]
are avoided. The starting value $s_0$ should be drawn from
a Gaussian distribution with variance $D/\gamma$, which is the
stationary distribution of the process $s$. The desired process
$x$ is obtained from the process $s$ by means of the
nonlinear transform (18a). A sample process for the parameter
set $(\gamma, D) = (0.75, 0.1)$ is depicted in figure 3.
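
A minimal sketch of such an exact simulation. It rests on two assumptions that should be checked against [16]: the update rule for $s$ is taken to be the standard exact Ornstein-Uhlenbeck step with stationary variance $D/\gamma$, and the nonlinear transform is assumed to be $x = \exp(s)$. The function name and the sample size are illustrative.

```python
import numpy as np

def simulate(n, tau, gamma, d, rng):
    """Sample an Ornstein-Uhlenbeck process s exactly at lag tau and return (s, exp(s))."""
    decay = np.exp(-gamma * tau)
    # Standard deviation of the exact transition PDF of s for a finite time lag tau.
    step_std = np.sqrt(d / gamma * (1.0 - decay**2))
    s = np.empty(n)
    # The starting value is drawn from the stationary distribution with variance d / gamma.
    s[0] = rng.normal(0.0, np.sqrt(d / gamma))
    noise = rng.normal(size=n - 1)
    for i in range(n - 1):
        s[i + 1] = decay * s[i] + step_std * noise[i]
    return s, np.exp(s)   # assumed transform from s to x

rng = np.random.default_rng(1)
s, x = simulate(200_000, tau=0.05, gamma=0.75, d=0.1, rng=rng)
```

Because the update uses the exact transition PDF, the time increment `tau` need not be small. The sample variance of `s` should be close to `d / gamma`, and its lag-one correlation close to `exp(-gamma * tau)`.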



[Figure 4: panels for samples A (lhs) and B (rhs) with axes x(t) and x(t+τ).]



FIG. 4: Test for Markov properties for simulated samples A without measurement noise (lhs) and B with artificial Gaussian
measurement noise with variance $2.25 \cdot 10^{-2}$ (rhs), respectively. In the upper panels, the conditional transition PDFs for
$\tau = 0.1$ (solid contour lines) are compared with the ones obtained for the same time increment by numerical integration of the
Chapman-Kolmogorov equation (2) for transition PDFs for increment $\tau/2$ (dashed contour lines). Contour lines are placed at
the levels 10, 1, 0.1 and 0.01. In the lower panels, a cross section of the transition PDF at $x(t) = 1$ is depicted. For reasons
of clarity, circles have been added to the dashed lines corresponding to the data set B. Perfect coincidence of the PDFs is
observed for A, whereas in case of B systematic deviations become evident. Consequently, Markov properties are spoiled by
the artificial measurement noise of sample B.



For the current example, time series A consisting of
$50 \cdot 10^6$ sample points with time increment $\tau = 0.05$ was
simulated. A second series, B, was generated from series
A by addition of independent, identically distributed
Gaussian random variables with variance $2.25 \cdot 10^{-2}$
that model noise stemming from non-systematic measurement
errors. Both series A and B have been subjected
to the same analysis procedure: Conditional
PDFs have been calculated from the data for time increment
$\tau = 0.1$. On the other hand, these conditional PDFs
have been calculated from conditional PDFs for the time
lag $\tau = 0.05$ by means of numerical integration of the
Chapman-Kolmogorov equation (2). The results are exhibited
in figure 4. In theory, these PDFs should coincide
with the former ones for Markovian processes.
However, distinctive systematic deviations show up in the
presence of measurement noise, as can be seen from the
analysis of data set B in figure 4. Hence, the artificial
measurement noise of time series B interferes with the
Markov properties of the underlying time series A.

The sets A and B correspond to the first and third
limiting case of equation (6), respectively, that were discussed
at the end of section III. The second case also can
be investigated by means of the current example with an
increased time lag $\tau$, such that $\exp(-\gamma\tau) \ll 1$. Then,
Markov properties are recovered even in case of strong
measurement noise.
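
The dependence on the time lag can be illustrated without any simulation if one simplifies the setting. The sketch below is not the analysis of series A and B: it assumes that Gaussian noise of variance `sigma2` is added directly to the Ornstein-Uhlenbeck variable $s$, so that all measured values are jointly Gaussian. In that case the conditional PDF of $y_2$ given $(y_1, y_0)$ is independent of $y_0$ exactly when the weight of $y_0$ in the best linear prediction of $y_2$ vanishes. The function name is hypothetical; the parameter values follow the example above.

```python
import numpy as np

def y0_weight(tau, sigma2, gamma=0.75, d=0.1):
    """Weight of y_0 in the best linear prediction of y_2 from (y_1, y_0)."""
    var_s = d / gamma                 # stationary variance of the clean process
    rho = np.exp(-gamma * tau)        # correlation of the clean process at lag tau
    # Covariance matrix of the predictors (y_1, y_0); the noise only enters the diagonal.
    cov = np.array([[var_s + sigma2, var_s * rho],
                    [var_s * rho, var_s + sigma2]])
    # Covariances of y_2 with y_1 and with y_0.
    cross = np.array([var_s * rho, var_s * rho**2])
    return np.linalg.solve(cov, cross)[1]

clean = y0_weight(0.05, 0.0)          # no measurement noise
noisy = y0_weight(0.05, 0.0225)       # noise variance as used for series B
large_lag = y0_weight(10.0, 0.0225)   # same noise, exp(-gamma * tau) << 1
```

Here `clean` vanishes up to rounding, `noisy` is of order one third, and `large_lag` is negligibly small, in line with the first, third and second limiting case, respectively.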



V. CONCLUSION 

The influence of different noise sources on the structure
of multivariate joint probability distribution functions
has been investigated. In particular, the effects
of noise on the sensitivity of transition probability density
functions to an additional, second condition have been
analysed. It turned out that noise generally has an impact
on these transition probability density functions and seriously
interferes with Markov properties, even if they are
fulfilled for the original, uncorrupted process. This fact
is, in our opinion, counter-intuitive.

The analysis of samples that are affected by measurement
noise has long been routine in applied
sciences and industrial applications. Typically, Kalman
filtering is applied for this purpose [24]. For a recent review
on this and other iterative techniques we refer to
[25]. Recently, Siefert et al. [26] addressed this problem
from a dynamical systems point of view. The intention
was to extend the efficient non-parametric estimation
procedure proposed by Siegert et al. [1] to
data suffering from measurement noise. In this context
it could be shown that intrinsic dynamical and external
measurement noise can in principle be separated from
one another if the sampling frequency is sufficiently high
and the amplitude of the measurement noise is weak.



Subsequently, Bottcher et al. succeeded in the efficient reconstruction
of simple processes even in the presence of strong
measurement noise [27]. Although the latter work is
based on eqn. (fT5| . the general problem of the vanish- 
ing Markov properties in presence of measurement noise 
could not be identified. This new point of view, however, 
involves a broad class of tools that are available for data 
analysis, since most tools rely on a finite embedding of 
the data. 

The new insights have consequences for future analysis 
of time series: The influence of measurement noise should 
be discussed for any individual method, that is applied 
for the analysis of time series. Explicitly, also effects 
stemming from discretization errors should be considered 
here. Eventually, methods might be applicable even to 
data sets, that until now could not be processed due to 
invalidity of Markov properties. This feature, however, 
might stem from artificial noise rather than from intrinsic 
properties of the dynamics of the underlying process. 